Rethinking formal models of partially observable multiagent decision making

Authors

Abstract

Multiagent decision-making in partially observable environments is usually modelled either as an extensive-form game (EFG) in game theory or as a partially observable stochastic game (POSG) in multiagent reinforcement learning (MARL). One issue with the current situation is that while most practical problems can be modelled in both formalisms, the relationship between the two models remains unclear, which hinders the transfer of ideas between the two communities. A second issue is that while EFGs have recently seen significant algorithmic progress, their classical formalization is unsuitable for efficient presentation of the underlying ideas, such as those around decomposition. To solve the first issue, we introduce factored-observation stochastic games (FOSGs), a minor modification of the POSG formalism which distinguishes between private and public observations and thereby greatly simplifies decomposition. To remedy the second issue, we show that FOSGs and POSGs are naturally connected to EFGs: by “unrolling” a FOSG into its tree form, we obtain an EFG. Conversely, any perfect-recall timeable EFG corresponds to some underlying FOSG in this manner. Moreover, this relationship justifies several minor modifications to the classical EFG formalization that recently appeared as an implicit response to the model's issues with decomposition. Finally, we illustrate the transfer of ideas from EFGs to MARL by presenting three key EFG techniques in the FOSG framework: counterfactual regret minimization, sequence form, and decomposition.
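To make the first of those techniques concrete, here is a minimal sketch of regret matching, the per-decision-point update at the core of counterfactual regret minimization. The game (rock–paper–scissors), the payoff matrix, and the fixed opponent strategy are illustrative assumptions, not taken from the paper; full CFR would apply this update at every information set of the game tree.

```python
import numpy as np

def regret_matching(cum_regret):
    """Map cumulative regrets to a strategy: clip to positive, then normalize."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(pos, 1.0 / len(pos))

# Row player's payoff matrix for rock-paper-scissors (toy example).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

opponent = np.array([0.4, 0.3, 0.3])  # fixed opponent mixed strategy

cum_regret = np.zeros(3)
cum_strategy = np.zeros(3)
for _ in range(10_000):
    sigma = regret_matching(cum_regret)
    cum_strategy += sigma
    u = A @ opponent          # value of each pure action against the opponent
    ev = sigma @ u            # expected value of the current mixed strategy
    cum_regret += u - ev      # accumulate instantaneous regrets

avg_strategy = cum_strategy / cum_strategy.sum()
# avg_strategy concentrates on index 1 ("paper"), the best response here.
```

Against a fixed opponent the average strategy converges to a best response; in self-play (as in CFR), the average strategies of both players converge to an equilibrium instead.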


Similar articles

Partially observable Markov decision processes

For reinforcement learning in environments in which an agent has access to a reliable state signal, methods based on the Markov decision process (MDP) have had many successes. In many problem domains, however, an agent suffers from limited sensing capabilities that preclude it from recovering a Markovian state signal from its perceptions. Extending the MDP framework, partially observable Markov...
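The device that restores the Markov property in such settings is the belief state: a probability distribution over hidden states, updated by Bayesian filtering after each action and observation. A minimal sketch, assuming a hypothetical two-state model with made-up numbers (not from the article above):

```python
import numpy as np

# Toy two-state POMDP: a single "listen" action that leaves the state
# unchanged, plus a noisy observation of the state (hypothetical numbers).
T = np.array([[1.0, 0.0],    # T[s, s']: P(next state s' | state s, listen)
              [0.0, 1.0]])
O = np.array([[0.85, 0.15],  # O[s', o]: P(observation o | next state s')
              [0.15, 0.85]])

def belief_update(b, o):
    """Bayes filter: b'(s') ∝ O[s', o] * sum_s T[s, s'] * b(s)."""
    b_pred = b @ T             # predict: push the belief through the dynamics
    b_new = O[:, o] * b_pred   # correct: weight by the observation likelihood
    return b_new / b_new.sum()

b = np.array([0.5, 0.5])       # uniform initial belief
b = belief_update(b, o=0)      # observe evidence favoring state 0
# b is now approximately [0.85, 0.15]
```

Planning then proceeds on this continuous belief space rather than on the unobservable state, which is exactly what makes POMDP solution methods harder than their MDP counterparts.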


Learning Partially Observable Models Using Temporally Abstract Decision Trees

This paper introduces timeline trees, which are partial models of partially observable environments. Timeline trees are given some specific predictions to make and learn a decision tree over history. The main idea of timeline trees is to use temporally abstract features to identify and split on features of key events, spread arbitrarily far apart in the past (whereas previous decision-tree-base...


Learning Partially Observable Action Models

In this paper we present tractable algorithms for learning a logical model of actions’ effects and preconditions in deterministic partially observable domains. These algorithms update a representation of the set of possible action models after every observation and action execution. We show that when actions are known to have no conditional effects, then the set of possible action models can be...


Decision Making Under Uncertainty: A Neural Model Based on Partially Observable Markov Decision Processes

A fundamental problem faced by animals is learning to select actions based on noisy sensory information and incomplete knowledge of the world. It has been suggested that the brain engages in Bayesian inference during perception but how such probabilistic representations are used to select actions has remained unclear. Here we propose a neural model of action selection and decision making based ...


Bounded-Parameter Partially Observable Markov Decision Processes

The POMDP is considered a powerful model for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model real-life situations precisely, for reasons such as limited data for learning the model. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter...



Journal

Journal title: Artificial Intelligence

Year: 2022

ISSN: 2633-1403

DOI: https://doi.org/10.1016/j.artint.2021.103645